Emory 8

Overview:

This page contains the results of CoNGA analyses. Results in tables may have been filtered to reduce redundancy, focus on the most important columns, and limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.
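
To work with the unfiltered results, the full tables can be read directly from disk. The sketch below is a minimal example that uses only the Python standard library. It assumes the tables are tab-separated with a header row; the helper name `load_tables` is hypothetical and is not part of CoNGA.

```python
import csv
import glob
import os


def load_tables(outfile_prefix):
    """Read every <outfile_prefix>*.tsv file into a list of row dicts.

    Assumption: each file is tab-separated and starts with a header row.
    """
    tables = {}
    for path in sorted(glob.glob(outfile_prefix + "*.tsv")):
        with open(path, newline="") as handle:
            rows = list(csv.DictReader(handle, delimiter="\t"))
        tables[os.path.basename(path)] = rows
    return tables


if __name__ == "__main__":
    # For this run the prefix passed as --outfile_prefix was emoryPair8Final.
    for name, rows in load_tables("emoryPair8Final").items():
        print(name, len(rows), "rows")
```

Note that the glob pattern also matches the clones file itself (emoryPair8Final.tsv) when it sits in the same directory, so that entry may need to be skipped.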

Command:

scripts/run_conga.py --all --gex_data /scratch.global/ben_testing/ben_tcr/Pair_8_Emory/outs/filtered_feature_bc_matrix.h5 --gex_data_type 10x_h5 --clones_file emoryPair8Final.tsv --organism human --outfile_prefix emoryPair8Final

Stats

num_cells_w_gex: 13705
num_features_start: 26530
num_cells_w_tcr: 1227
min_genes_per_cell: 200
max_genes_per_cell: 3500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 912
num_filt_max_percent_mito: 0
num_antibody_features: 0
num_TR_genes: 43
num_TR_genes_in_hvg_set: 38
num_highly_variable_genes: 2269
num_cells_after_filtering: 315
num_clonotypes: 262
max_clonotype_size: 12
num_singleton_clonotypes: 236
nbr_frac_for_nndists: 0.1
num_gvg_hit_clonotypes: 6
num_gvg_hit_biclusters: 0

graph_vs_graph_stats


Here we assess overall graph-vs-graph correlation by counting the edges shared between the TCR and GEX neighbor graphs and comparing that observed number to the number we would expect if the graphs were completely uncorrelated. Our null model for uncorrelated graphs is to take the vertices of one graph and randomly renumber them (permute their labels). We compare the observed overlap to that expected under this null model by computing a Z-score, either by permuting the vertex labels of one graph many times to get a mean and standard deviation of the overlap distribution, or, for large graphs where this is time consuming, by using a regression model for the standard deviation.

The rows of this table correspond to the graph-graph comparisons made in the CoNGA graph-vs-graph analysis: we compare K-nearest-neighbor graphs for GEX and TCR at different K values ("nbr_frac" aka neighbor-fraction, which reports K as a fraction of the total number of clonotypes) to each other and to GEX and TCR "cluster" graphs in which each clonotype is connected to all the other clonotypes with the same (GEX or TCR) cluster assignment. For two K values (the default), this gives 2*3=6 comparisons: GEX KNN graph vs TCR KNN graph, GEX cluster graph vs TCR KNN graph, and GEX KNN graph vs TCR cluster graph, for each of the two K values (aka nbr_fracs).

The column to look at is *overlap_zscore*. Higher values indicate more significant GEX/TCR covariation, with "interesting" levels starting around zscores of 3-5.

Columns in more detail:

graph_overlap_type: KNN ("nbr") or cluster versus KNN ("nbr") or cluster

nbr_frac: the K value for the KNN graph, as a fraction of total clonotypes

overlap: the observed overlap (number of shared edges) between GEX and TCR graphs

expected_overlap: the expected overlap under a shuffled null model.

overlap_zscore: a Z-score for the observed overlap computed by subtracting the expected overlap and dividing by the standard deviation estimated from shuffling.
overlap expected_overlap overlap_mean overlap_sdev overlap_zscore overlap_zscore_fitted overlap_zscore_source nodes calculation_time calculation_time_fitted gex_edges tcr_edges gex_indegree_variance gex_indegree_skewness gex_indegree_kurtosis tcr_indegree_variance tcr_indegree_skewness tcr_indegree_kurtosis indegree_correlation_R indegree_correlation_P nbr_frac graph_overlap_type
2 4.015326 4.09 2.020371 -1.034463 -1.694141 shuffling 262 0.050308 0.000231 524 524 1.643678 2.098540 5.672902 0.597701 0.878505 0.883814 0.035756 0.564493 0.01 gex_nbr_vs_tcr_nbr
94 84.659004 84.25 9.815676 0.993309 1.276800 shuffling 262 0.105123 0.005581 524 11048 1.643678 2.098540 5.672902 0.189351 -0.069373 -1.056194 -0.057329 0.355340 0.01 gex_nbr_vs_tcr_cluster
97 90.789272 89.61 9.756941 0.757410 1.143899 shuffling 262 0.108160 0.006004 11848 524 0.132700 -0.676264 -1.155820 0.597701 0.878505 0.883814 -0.070397 0.256193 0.01 gex_cluster_vs_tcr_nbr
706 678.590038 678.03 32.549487 0.859307 0.902915 shuffling 262 0.091219 0.039407 6812 6812 1.125553 1.721833 3.070619 0.259437 1.443268 3.280935 -0.004940 0.936571 0.10 gex_nbr_vs_tcr_nbr
1124 1100.567050 1097.45 41.214652 0.644188 0.598449 shuffling 262 0.115277 0.065300 6812 11048 1.125553 1.721833 3.070619 0.189351 -0.069373 -1.056194 -0.019091 0.758411 0.10 gex_nbr_vs_tcr_cluster
1275 1180.260536 1184.02 40.203229 2.263002 3.659651 shuffling 262 0.117067 0.070246 11848 6812 0.132700 -0.676264 -1.155820 0.259437 1.443268 3.280935 -0.011697 0.850540 0.10 gex_cluster_vs_tcr_nbr
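
The numbers in this table can be cross-checked with a few lines of arithmetic. The sketch below rests on three assumptions inferred from the table itself rather than taken from the CoNGA source: K is nbr_frac times the number of nodes, rounded down; expected_overlap is gex_edges * tcr_edges / (nodes * (nodes - 1)); and the shuffling Z-score is (overlap - overlap_mean) / overlap_sdev. All function names are hypothetical. A small label-permutation routine is included to illustrate the null model described above.

```python
import random
from statistics import mean, pstdev


def knn_size(nbr_frac, nodes):
    """K for the KNN graph (assumed: fraction of nodes, rounded down)."""
    return max(1, int(nbr_frac * nodes))


def expected_overlap(gex_edges, tcr_edges, nodes):
    """Expected shared directed edges if the two graphs are unrelated."""
    return gex_edges * tcr_edges / (nodes * (nodes - 1))


def overlap_zscore(overlap, overlap_mean, overlap_sdev):
    """Z-score of the observed overlap against the shuffled distribution."""
    return (overlap - overlap_mean) / overlap_sdev


def shuffled_overlap_stats(gex_edges, tcr_edges, nodes, num_shuffles=100, seed=0):
    """Mean and sdev of the overlap after randomly relabeling TCR vertices.

    gex_edges and tcr_edges are sets of (i, j) directed edges.
    """
    rng = random.Random(seed)
    overlaps = []
    for _ in range(num_shuffles):
        perm = list(range(nodes))
        rng.shuffle(perm)
        relabeled = {(perm[i], perm[j]) for i, j in tcr_edges}
        overlaps.append(len(gex_edges & relabeled))
    return mean(overlaps), pstdev(overlaps)


if __name__ == "__main__":
    # First row of the table: nbr_frac 0.01, 262 nodes, 524 edges per graph.
    print(knn_size(0.01, 262))
    print(round(expected_overlap(524, 524, 262), 6))
    print(round(overlap_zscore(2, 4.09, 2.020371), 6))
```

For the first row these formulas give K=2, an expected overlap of 4.015326, and a Z-score of -1.034463, which agree with the tabulated values.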

graph_vs_graph


Graph vs graph analysis looks for correlation between GEX and TCR space by finding statistically significant overlap between two similarity graphs, one defined by GEX similarity and one by TCR sequence similarity.

Overlap is defined one node (clonotype) at a time by looking for overlap between that node's neighbors in the GEX graph and its neighbors in the TCR graph. The null model is that the two neighbor sets are chosen independently at random.

CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where K = neighborhood size is specified as a fraction of the number of clonotypes (defaults for K are 0.01 and 0.1), and cluster graphs, where each clonotype is connected to all the other clonotypes in the same (GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN, GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the K values (called nbr_fracs short for neighbor fractions).

Columns (depend slightly on whether hit is KNN v KNN or KNN v cluster):

conga_score = P value for GEX/TCR overlap * number of clonotypes

mait_fraction = fraction of the overlap made up of 'invariant' T cells

num_neighbors* = size of neighborhood (K)

cluster_size = size of cluster (for KNN v cluster graph overlaps)

clone_index = 0-index of clonotype in adata object


conga_score num_neighbors_gex num_neighbors_tcr overlap overlap_corrected mait_fraction clone_index nbr_frac graph_overlap_type cluster_size gex_cluster tcr_cluster va ja cdr3a vb jb cdr3b
0.400422 26.0 26.0 8 8 0.0 100 0.1 gex_nbr_vs_tcr_nbr NaN 3 4 TRAV19*01 TRAJ57*01 CALNEGWKGGSEKLVF TRBV12-2*01 TRBJ1-2*01 CASSFTGGSSYDYTF
0.628428 26.0 NaN 9 9 0.0 43 0.1 gex_nbr_vs_tcr_cluster 34.0 1 3 TRAV13-1*01 TRAJ28*01 CAAATYSGAGSYQLTF TRBV20-1*01 TRBJ1-4*01 CSATQGGGKLFF
0.693517 26.0 NaN 7 7 0.0 209 0.1 gex_nbr_vs_tcr_cluster 22.0 3 4 TRAV8-3*01 TRAJ11*01 CAVSDLGYSTLTF TRBV24-1*01 TRBJ1-5*01 CATSQSRIRQPQYF
0.825420 NaN 26.0 12 12 0.0 13 0.1 gex_cluster_vs_tcr_nbr 57.0 1 3 TRAV12-1*01 TRAJ45*01 CAVRLSANRLTF TRBV20-1*01 TRBJ2-5*01 CSALAYRETQYF
0.825420 NaN 26.0 12 12 0.0 196 0.1 gex_cluster_vs_tcr_nbr 57.0 1 3 TRAV6*01 TRAJ18*01 CALDMRVRGSTLGKLYF TRBV20-1*01 TRBJ1-1*01 CSADRSGGITEAFF
0.895093 NaN 26.0 8 8 0.0 160 0.1 gex_cluster_vs_tcr_nbr 29.0 3 4 TRAV38-1*01 TRAJ32*01 CAFMKHLGGYGGSGNKLIF TRBV13*01 TRBJ1-2*01 CASSPGYRPNYDYTF
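
The per-clonotype null model described above (two neighbor sets chosen independently at random) is commonly evaluated with a hypergeometric tail probability. The sketch below takes that approach as an assumption; the exact population size and formula used by CoNGA may differ, and the function names are hypothetical. Here the population is taken to be the other clonotypes (number of clonotypes minus one).

```python
from math import comb


def overlap_pvalue(overlap, num_neighbors_gex, num_neighbors_tcr, num_others):
    """P(X >= overlap) for two neighbor sets drawn independently at random.

    X is hypergeometric: num_neighbors_tcr draws from num_others clonotypes,
    of which num_neighbors_gex are GEX neighbors.
    """
    total = comb(num_others, num_neighbors_tcr)
    upper = min(num_neighbors_gex, num_neighbors_tcr)
    tail = sum(
        comb(num_neighbors_gex, k)
        * comb(num_others - num_neighbors_gex, num_neighbors_tcr - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def conga_score(overlap, num_neighbors_gex, num_neighbors_tcr, num_clonotypes):
    """P value multiplied by the number of clonotypes, as in the column list."""
    pvalue = overlap_pvalue(
        overlap, num_neighbors_gex, num_neighbors_tcr, num_clonotypes - 1
    )
    return pvalue * num_clonotypes


if __name__ == "__main__":
    # Shape of the first table row: overlap 8 with 26 neighbors in each graph.
    print(conga_score(8, 26, 26, 262))
```

Because the score is a P value scaled by the number of clonotypes, values near or above 1 (as in every row of this table) indicate overlaps that are not surprising under the null.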

tcr_clumping


This table stores the results of the TCR "clumping" analysis, which looks for neighborhoods in TCR space with more TCRs than expected by chance under a simple null model of VDJ rearrangement.

For each TCR in the dataset, we count how many TCRs are within a set of fixed TCRdist radii (defaults: 24,48,72,96), and compare that number to the expected number given the size of the dataset using a Poisson model. This analysis is inspired by the ALICE and TCRnet methods.
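
The counting-and-comparison step can be sketched as follows. This is a minimal illustration, not CoNGA's implementation: the distance matrix and the per-TCR expected counts are taken as given inputs (in practice the expected counts would come from the rearrangement null model), "within" is assumed to mean less than or equal to the radius, and all names are hypothetical.

```python
from math import exp


def poisson_sf(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = exp(-expected)  # P(X = 0)
    cdf = 0.0
    for k in range(observed):
        cdf += term
        term *= expected / (k + 1)  # P(X = k + 1)
    return max(0.0, 1.0 - cdf)


def count_neighbors(distances, index, radius):
    """Number of other TCRs whose distance to TCR `index` is <= radius."""
    row = distances[index]
    return sum(1 for j, d in enumerate(row) if j != index and d <= radius)


def clumping_pvalues(distances, expected_counts, radii=(24, 48, 72, 96)):
    """One (index, radius, num_nbrs, pvalue) tuple per TCR and radius.

    expected_counts[i][radius] is the expected neighbor count for TCR i
    (hypothetical input, supplied by a background model).
    """
    results = []
    for i in range(len(distances)):
        for radius in radii:
            num_nbrs = count_neighbors(distances, i, radius)
            pvalue = poisson_sf(num_nbrs, expected_counts[i][radius])
            results.append((i, radius, num_nbrs, pvalue))
    return results
```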

Columns:

clump_type = 'global' unless we are optionally looking for TCR clumps within the individual GEX clusters

num_nbrs = neighborhood size (number of other TCRs with TCRdist


tcr_db_match


This table stores significant matches between TCRs in adata and TCRs in the file /scratch.global/ben_testing/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv

P values of matches are assigned by turning the raw TCRdist score into a P value based on a model of the V(D)J rearrangement process, so matches between TCRs that are very far from germline (for example) are assigned a higher significance.

Columns:

tcrdist: TCRdist distance between the two TCRs (adata query and db hit)

pvalue_adj: raw P value of the match * num query TCRs * num db TCRs

fdr_value: Benjamini-Hochberg FDR value for match

clone_index: index within adata of the query TCR clonotype

db_index: index of the hit in the database being matched

va,ja,cdr3a,vb,jb,cdr3b

db_XXX: where XXX is a field in the literature database
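
The two corrections listed above, pvalue_adj and fdr_value, can be illustrated with a short sketch. It assumes the standard Benjamini-Hochberg step-up procedure and, by default, takes the number of tests to be the number of P values supplied; how CoNGA counts tests for the FDR is not stated here, so that default is an assumption. Function names are hypothetical.

```python
def adjust_pvalue(raw_pvalue, num_query_tcrs, num_db_tcrs):
    """pvalue_adj as defined above: raw P value * num query * num db TCRs."""
    return raw_pvalue * num_query_tcrs * num_db_tcrs


def benjamini_hochberg(pvalues, num_tests=None):
    """Benjamini-Hochberg FDR values, returned in the input order."""
    m = num_tests if num_tests is not None else len(pvalues)
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    fdr = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        fdr[i] = running_min
    return fdr


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```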



tcr_graph_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed first by a t-test, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P-value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons

mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons

log2enr = log2 fold change of gene in neighborhood (will be positive)

gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores

num_fg = the number of clonotypes in the neighborhood (including center)

mean_fg = the mean value of the feature in the neighborhood

mean_bg = the mean value of the feature outside the neighborhood

feature = the name of the gene

mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR

clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.


ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction nbr_frac graph_type feature_type
2.244313e-17 5.018759e-25 6.143825 1 3 ENSMMUG00000043894 3.022686 0.244070 35 -1 0.0 0.00 tcr_cluster gex
7.455134e-03 5.542615e-24 6.143828 0 1 ENSMMUG00000060662 1.944040 0.081275 27 240 0.0 0.10 tcr_nbr gex
4.682479e-02 4.534282e-21 5.903432 0 1 ENSMMUG00000060662 1.879571 0.088682 27 238 0.0 0.10 tcr_nbr gex
2.781446e-01 6.682292e-19 5.722198 5 1 ENSMMUG00000060662 1.829304 0.094458 27 243 0.0 0.10 tcr_nbr gex
2.888083e-01 1.830371e-18 5.641075 0 1 ENSMMUG00000060662 1.806370 0.097093 27 230 0.0 0.10 tcr_nbr gex
2.850508e-01 2.871732e-18 5.587451 0 1 ENSMMUG00000060662 1.791071 0.098851 27 225 0.0 0.10 tcr_nbr gex
3.566758e-01 4.967101e-18 5.475248 0 1 ENSMMUG00000060662 1.758713 0.102568 27 228 0.0 0.10 tcr_nbr gex
3.442440e-01 5.220053e-18 5.483158 0 1 ENSMMUG00000060662 1.761009 0.102304 27 219 0.0 0.10 tcr_nbr gex
3.458164e-01 6.689329e-18 5.454757 0 1 ENSMMUG00000060662 1.752754 0.103253 27 223 0.0 0.10 tcr_nbr gex
1.779658e+00 6.207490e-16 5.277266 5 1 ENSMMUG00000060662 1.700546 0.109251 27 227 0.0 0.10 tcr_nbr gex
1.699862e+00 1.202008e-15 5.223451 2 1 ENSMMUG00000060662 1.684516 0.111093 27 234 0.0 0.10 tcr_nbr gex
1.482734e+00 1.450578e-15 5.281179 0 1 ENSMMUG00000060662 1.701708 0.109118 27 239 0.0 0.10 tcr_nbr gex
1.432099e+00 1.749898e-15 5.251437 0 1 ENSMMUG00000060662 1.692864 0.110134 27 224 0.0 0.10 tcr_nbr gex
1.458250e+00 2.013772e-15 5.250125 0 1 ENSMMUG00000060662 1.692473 0.110179 27 237 0.0 0.10 tcr_nbr gex
2.926445e-02 4.245694e-14 4.892664 0 1 ENSMMUG00000060662 1.185828 0.073744 47 -1 0.0 0.00 tcr_cluster gex
6.908012e+00 5.446651e-13 4.922364 0 1 ENSMMUG00000060662 1.593302 0.121573 27 218 0.0 0.10 tcr_nbr gex
7.862132e+00 8.816225e-13 4.873058 0 1 ENSMMUG00000060662 1.578143 0.123315 27 229 0.0 0.10 tcr_nbr gex
1.255972e-04 5.976196e-12 5.108541 1 3 ENSMMUG00000043894 2.787112 0.365727 27 121 0.0 0.10 tcr_nbr gex
1.189078e-03 2.640949e-11 4.851444 4 3 ENSMMUG00000043894 2.663305 0.379952 27 81 0.0 0.10 tcr_nbr gex
2.735097e-04 3.697337e-11 4.908379 1 3 ENSMMUG00000043894 2.690738 0.376800 27 196 0.0 0.10 tcr_nbr gex
3.475505e-03 8.382899e-10 4.772083 4 3 ENSMMUG00000043894 2.625058 0.384346 27 147 0.0 0.10 tcr_nbr gex
6.501781e-03 1.058586e-09 4.711510 2 3 ENSMMUG00000043894 2.595863 0.387700 27 214 0.0 0.10 tcr_nbr gex
4.080873e-03 1.089854e-09 4.775729 1 3 ENSMMUG00000043894 2.626815 0.384144 27 18 0.0 0.10 tcr_nbr gex
3.407767e-03 1.945689e-09 4.730350 4 3 ENSMMUG00000043894 2.604943 0.386657 27 24 0.0 0.10 tcr_nbr gex
3.842020e-03 2.121496e-09 4.723664 1 3 ENSMMUG00000043894 2.601721 0.387027 27 118 0.0 0.10 tcr_nbr gex
2.045944e-02 3.264041e-09 4.555572 1 3 ENSMMUG00000043894 2.520710 0.396335 27 184 0.0 0.10 tcr_nbr gex
4.371723e-03 6.842264e-09 4.633070 1 3 ENSMMUG00000043894 2.558056 0.392044 27 15 0.0 0.10 tcr_nbr gex
2.164265e+00 2.255201e-06 4.111250 1 3 ENSMMUG00000043894 2.307001 0.420889 27 64 0.0 0.10 tcr_nbr gex
2.079656e+00 2.912219e-06 4.080762 4 3 ENSMMUG00000043894 2.292380 0.422569 27 74 0.0 0.10 tcr_nbr gex
4.155404e-01 3.661825e-06 4.255085 1 3 ENSMMUG00000043894 2.376070 0.412953 27 22 0.0 0.10 tcr_nbr gex
1.744567e+00 4.961917e-06 4.109200 1 3 ENSMMUG00000043894 2.306017 0.421002 27 130 0.0 0.10 tcr_nbr gex
4.828658e-01 6.711579e-06 4.175188 1 3 ENSMMUG00000043894 2.337686 0.417363 27 110 0.0 0.10 tcr_nbr gex
2.536608e+00 8.200559e-06 4.010442 1 3 ENSMMUG00000043894 2.258687 0.426440 27 68 0.0 0.10 tcr_nbr gex
1.888240e+00 1.105923e-05 4.017658 1 3 ENSMMUG00000043894 2.262142 0.426043 27 28 0.0 0.10 tcr_nbr gex
5.773776e-01 1.317929e-05 5.000747 2 5 TRIM23 0.780453 0.036267 3 174 0.0 0.01 tcr_nbr gex
1.039705e+00 3.031310e-05 3.988643 1 3 ENSMMUG00000043894 2.248252 0.427639 27 79 0.0 0.10 tcr_nbr gex
4.565668e+00 3.464854e-05 4.025198 1 3 ENSMMUG00000043894 2.265754 0.425628 27 206 0.0 0.10 tcr_nbr gex
3.560770e-73 5.110138e-05 5.530760 1 7 ENSMMUG00000056910 1.700533 0.092432 3 62 0.0 0.01 tcr_nbr gex
7.945207e+00 7.666406e-05 3.922797 1 3 ENSMMUG00000043894 2.216758 0.431257 27 250 0.0 0.10 tcr_nbr gex
3.106427e+00 1.148110e-04 3.982158 1 3 ENSMMUG00000043894 2.245148 0.427995 27 119 0.0 0.10 tcr_nbr gex
6.064211e+00 2.795210e-04 3.839704 1 3 ENSMMUG00000043894 2.177083 0.435816 27 43 0.0 0.10 tcr_nbr gex
2.521144e-28 1.952129e-02 4.030483 4 2 ZNF48 0.569399 0.045879 3 23 0.0 0.01 tcr_nbr gex
9.299052e-01 2.146723e-02 3.877036 4 4 BAG4 0.575650 0.051615 3 91 0.0 0.01 tcr_nbr gex
1.302097e+00 3.562136e-02 4.029565 0 2 C13H2orf15 0.646091 0.054112 3 21 0.0 0.01 tcr_nbr gex
1.668233e-03 1.864302e-01 6.022600 0 3 ENSMMUG00000060662 2.935597 0.242402 3 232 0.0 0.01 tcr_nbr gex
6.000005e-05 4.352498e-01 3.897599 2 5 BRWD3 0.780453 0.076350 3 174 0.0 0.01 tcr_nbr gex
1.149514e-110 4.611845e-01 3.790806 3 4 ENSMMUG00000049532 0.629644 0.061435 3 94 0.0 0.01 tcr_nbr gex
1.239246e-04 5.026338e-01 3.732104 4 0 AKT1 0.592910 0.059116 3 155 0.0 0.01 tcr_nbr gex
3.231656e-01 7.589127e-01 5.612850 1 1 ENSMMUG00000060662 2.681532 0.245344 3 219 0.0 0.01 tcr_nbr gex
1.487335e-98 1.141779e+00 3.527302 3 4 CHERP 0.629644 0.073305 3 94 0.0 0.01 tcr_nbr gex
Omitted 10 lines
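
The two-stage screen described for this table (a fast t-test on everything, then a Mann-Whitney U test only for combinations that pass a looser threshold) can be sketched with the standard library alone. To stay dependency-free, both P values below use normal approximations (Welch's t statistic referred to a normal distribution, and the large-sample U approximation without tie correction), which is a simplification and not what CoNGA computes. All names and defaults other than the factor of 10 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_pvalue(fg, bg):
    """Two-sided P value for Welch's t statistic (normal approximation)."""
    se = sqrt(variance(fg) / len(fg) + variance(bg) / len(bg))
    if se == 0:
        return 1.0
    t = (mean(fg) - mean(bg)) / se
    return 2 * (1 - NormalDist().cdf(abs(t)))


def mwu_pvalue(fg, bg):
    """Two-sided Mann-Whitney U P value (normal approximation, no ties fix)."""
    n1, n2 = len(fg), len(bg)
    u = sum((x > y) + 0.5 * (x == y) for x in fg for y in bg)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def two_stage_test(fg, bg, num_comparisons, pvalue_threshold=1.0,
                   prefilter_factor=10.0):
    """Run the slow test only if the fast one passes the looser threshold."""
    ttest_adj = welch_t_pvalue(fg, bg) * num_comparisons
    if ttest_adj > prefilter_factor * pvalue_threshold:
        return None  # screened out; the Mann-Whitney U test is never run
    mwu_adj = mwu_pvalue(fg, bg) * num_comparisons
    return {"ttest_pvalue_adj": ttest_adj, "mwu_pvalue_adj": mwu_adj}
```

Because both adjusted values are raw P values multiplied by the number of comparisons, they can exceed 1, as several ttest_pvalue_adj entries in the table do.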

tcr_graph_vs_gex_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: emoryPair8Final_tcr_graph_vs_gex_features_plot.png

tcr_graph_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: emoryPair8Final_tcr_graph_vs_gex_features_panels.png

tcr_genes_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed first by a t-test, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P-value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons

mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons

log2enr = log2 fold change of gene in neighborhood (will be positive)

gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores

num_fg = the number of clonotypes in the neighborhood (including center)

mean_fg = the mean value of the feature in the neighborhood

mean_bg = the mean value of the feature outside the neighborhood

feature = the name of the gene

mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR

clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.

In this analysis the TCR graph is defined by connecting all clonotypes that have the same VA/JA/VB/JB gene segment (it is run four times, once with each gene segment type).
ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction gene_segment graph_type feature_type
6.269412e-17 4.853489e-46 9.062205 0 3 ENSMMUG00000060662 2.554212 0.021946 26 -1 0.0 TRAV8-7 tcr_genes gex
2.999463e-08 3.610592e-44 10.043410 1 2 ENSMMUG00000062211 3.326379 0.025114 15 -1 0.0 TRBV12-2 tcr_genes gex
6.511641e-10 1.178803e-31 8.973724 5 2 ENSMMUG00000062085 3.317979 0.051564 15 -1 0.0 TRBV4-3 tcr_genes gex
1.779122e-46 1.488535e-29 6.679359 1 3 ENSMMUG00000043894 3.228590 0.212323 35 -1 0.0 TRBV20-1 tcr_genes gex
1.868253e+00 4.198438e-28 10.208363 1 2 ENSMMUG00000063185 3.207270 0.019844 6 -1 0.0 TRBV4-2 tcr_genes gex
3.882604e+00 4.330226e-26 7.986221 5 3 ENSMMUG00000059325 1.954375 0.023616 13 -1 0.0 TRAV25 tcr_genes gex
1.531591e+00 1.778148e-11 7.778437 3 5 ENSMMUG00000051385 3.355458 0.118651 6 -1 0.0 TRBV7-4 tcr_genes gex
1.261532e-01 3.482693e-09 7.808343 2 0 ENSMMUG00000051385 3.462733 0.129158 5 -1 0.0 TRBV7-6 tcr_genes gex
1.541306e-01 2.908451e-05 5.660920 2 1 ENSMMUG00000056515 3.238345 0.394789 9 -1 0.0 TRBV6-3 tcr_genes gex
8.653139e-02 5.763194e-02 4.572084 0 6 ENSMMUG00000043894 2.899598 0.543311 8 -1 0.0 TRBV21-1 tcr_genes gex
8.744927e+00 6.177033e-02 3.117088 4 4 ENSMMUG00000056515 1.773075 0.446913 9 -1 0.0 TRBV10-1 tcr_genes gex
2.863030e-01 1.179343e-01 4.100708 1 7 ENSMMUG00000056515 2.353249 0.441388 7 -1 0.0 TRBV9 tcr_genes gex
2.837480e-01 1.199131e-01 6.194817 5 0 ENSMMUG00000056515 3.698022 0.430103 5 -1 0.0 TRBV6-2 tcr_genes gex
2.583847e-12 2.701260e+00 2.769026 2 0 GPATCH2 0.590825 0.111690 4 -1 0.0 TRAV35 tcr_genes gex
4.150764e-11 8.588965e+00 2.750653 2 0 MANBA 0.590825 0.113044 4 -1 0.0 TRAV35 tcr_genes gex
8.748592e-01 9.787085e+00 1.029846 0 0 MATR3 1.264374 0.808449 24 -1 0.0 TRBJ1-4 tcr_genes gex

tcr_genes_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj' ie Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: emoryPair8Final_tcr_genes_vs_gex_features_panels.png

gex_graph_vs_tcr_features


This table has results from a graph-vs-features analysis in which we look at the distribution of a set of TCR-defined features over the GEX neighbor graph. We look for neighborhoods in the graph that have biased score distributions, as assessed first by a t-test, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P-value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a tcr feature.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons

ttest_stat = ttest statistic (sign indicates whether feature is up or down)

mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons

gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores

num_fg = the number of clonotypes in the neighborhood (including center)

mean_fg = the mean value of the feature in the neighborhood

mean_bg = the mean value of the feature outside the neighborhood

feature = the name of the TCR score

mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR

clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.


ttest_pvalue_adj ttest_stat mwu_pvalue_adj gex_cluster tcr_cluster num_fg mean_fg mean_bg feature mait_fraction clone_index nbr_frac graph_type feature_type
1.808341e-118 44.463843 0.149465 2.0 2.0 3.0 1.000000 0.115830 TRBJ2-1 0.0 33 0.01 gex_nbr tcr
7.525556e-115 42.863519 0.305571 5.0 3.0 3.0 1.000000 0.123552 TRBV20-1 0.0 68 0.01 gex_nbr tcr
7.525556e-115 42.863519 0.305571 1.0 3.0 3.0 1.000000 0.123552 TRBV20-1 0.0 121 0.01 gex_nbr tcr
7.525556e-115 42.863519 0.305571 4.0 3.0 3.0 1.000000 0.123552 TRBV20-1 0.0 242 0.01 gex_nbr tcr
2.572694e-01 5.130493 1.547852 1.0 0.0 27.0 -0.047797 -0.307910 kf6 0.0 22 0.10 gex_nbr tcr
2.966857e-01 -5.061649 2.252206 4.0 3.0 27.0 -2.182790 -1.198783 imhc 0.0 246 0.10 gex_nbr tcr

gex_graph_vs_tcr_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_gex_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: emoryPair8Final_gex_graph_vs_tcr_features_plot.png

gex_graph_vs_tcr_features_panels


Image source: emoryPair8Final_gex_graph_vs_tcr_features_panels.png
ERROR -- missing image emoryPair8Final_gex_graph_vs_tcr_features_panels.png

graph_vs_features_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=26 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: emoryPair8Final_graph_vs_features_gex_clustermap.png
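
The score transformation described for this clustermap (Z-score normalization, averaging over the K nearest neighbors, then downscaling features from the same modality as the landscape) can be sketched as follows. This is an illustration under stated assumptions, not CoNGA's code: the clonotype itself is assumed to be included in its neighbor average, the population standard deviation is used for the Z-score, and the function names are hypothetical. Only rescale_factor_for_self_features=0.33 comes from the text above.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score normalize one feature across all clonotypes."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def neighbor_average(scores, neighbors):
    """Average each clonotype's score with those of its listed neighbors.

    neighbors[i] holds the indices of the K nearest neighbors of clonotype i;
    an empty list corresponds to no nbr-averaging.
    """
    return [
        mean([scores[i]] + [scores[j] for j in nbrs])
        for i, nbrs in enumerate(neighbors)
    ]


def clustermap_scores(values, neighbors, same_modality,
                      rescale_factor_for_self_features=0.33):
    """Scores as plotted: normalized, averaged, and optionally downscaled."""
    averaged = neighbor_average(zscore(values), neighbors)
    factor = rescale_factor_for_self_features if same_modality else 1.0
    return [factor * s for s in averaged]
```

Dividing a plotted self-modality score by 0.33 is the same as multiplying by about 3.03, which is the conversion quoted for the colormap.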

graph_vs_features_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=26 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: emoryPair8Final_graph_vs_features_tcr_clustermap.png

graph_vs_summary


Summary figure for the graph-vs-graph and graph-vs-features analyses.
Image source: emoryPair8Final_graph_vs_summary.png

gex_clusters_tcrdist_trees


These are TCRdist hierarchical clustering trees for the GEX clusters (cluster assignments stored in adata.obs['clusters_gex']). The trees are colored by CoNGA score with a color score range of 2.62e+00 (blue) to 2.62e-09 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: emoryPair8Final_gex_clusters_tcrdist_trees.png

conga_threshold_tcrdist_tree


This is a TCRdist hierarchical clustering tree for the clonotypes with CoNGA score less than 10.0. The tree is colored by CoNGA score with a color score range of 1.00e+01 (blue) to 1.00e-08 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: emoryPair8Final_conga_threshold_tcrdist_tree.png

hotspot_features


Find GEX (TCR) features that show a biased distribution across the TCR (GEX) neighbor graph, using a simplified version of the Hotspot method from the Yosef lab.

DeTomaso, D., & Yosef, N. (2021). "Hotspot identifies informative gene modules across modalities of single-cell genomics." Cell Systems, 12(5), 446–456.e9.

PMID:33951459

Columns:

Z: HotSpot Z statistic

pvalue_adj: Raw P value times the number of tests (crude Bonferroni correction)

nbr_frac: The K NN nbr fraction used for the neighbor graph construction (nbr_frac = 0.1 means K=0.1*num_clonotypes neighbors)


Z pvalue_adj feature feature_type nbr_frac
31.600037 2.721964e-215 ENSMMUG00000060662 gex 0.10
29.992100 9.182532e-194 ENSMMUG00000043894 gex 0.10
13.770920 2.814559e-39 ENSMMUG00000062085 gex 0.10
13.705246 6.971152e-39 ENSMMUG00000060662 gex 0.01
13.146993 1.307013e-35 ENSMMUG00000043894 gex 0.01
12.224595 1.695591e-30 ENSMMUG00000062211 gex 0.10
11.804036 2.745959e-28 ENSMMUG00000056910 gex 0.10
9.669365 3.003285e-18 ENSMMUG00000059325 gex 0.10
9.167617 3.569391e-16 ENSMMUG00000056910 gex 0.01
8.977690 2.040930e-15 ENSMMUG00000056515 gex 0.10
8.938303 2.916925e-15 ENSMMUG00000013725 gex 0.01
8.899772 4.130520e-15 ARHGAP24 gex 0.01
7.093737 9.635035e-09 ENSMMUG00000059325 gex 0.01
6.688777 1.661048e-07 ENSMMUG00000061119 gex 0.01
5.886012 2.920119e-05 NRTN gex 0.01
5.878370 3.058139e-05 TNS2 gex 0.01
5.835053 3.969021e-05 ENSMMUG00000056783 gex 0.01
5.720882 7.821851e-05 GSTO2 gex 0.01
5.700405 8.821984e-05 ENSMMUG00000062085 gex 0.01
5.656015 1.143521e-04 LKAAEAR1 gex 0.01
5.259207 1.067868e-03 C4H6orf132 gex 0.01
5.148158 1.941622e-03 ENSMMUG00000061119 gex 0.10
4.909238 6.748512e-03 ENSMMUG00000052673 gex 0.01
4.853287 8.962868e-03 ENSMMUG00000062211 gex 0.01
4.805447 1.139671e-02 MORN4 gex 0.01
4.794499 1.203710e-02 FGD4 gex 0.01
4.753354 1.476717e-02 ENSMMUG00000063055 gex 0.01
4.678995 2.127831e-02 ENSMMUG00000056515 gex 0.01
3.501573 2.937006e-02 TRBV19 tcr 0.01
4.601451 3.096756e-02 PLCB1 gex 0.01

hotspot_gex_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the GEX UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) to a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: emoryPair8Final_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png
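
The redundancy filter described for this plot can be sketched as a greedy pass over the ranked features. This is an illustration under stated assumptions, not the conga implementation: the function name, the input layout (one row of scores per feature, rows already ranked best-first), and the use of the plain Pearson correlation are all assumptions. Only the 0.9 threshold and the max_feature_correlation name come from the text above.

```python
import numpy as np


def filter_correlated_features(scores, names, max_feature_correlation=0.9):
    """Greedy redundancy filter (hypothetical helper).

    scores: array of shape (n_features, n_clonotypes), rows ranked best-first.
    A feature is skipped if its correlation with any already-kept feature
    is >= max_feature_correlation.
    """
    scores = np.asarray(scores, dtype=float)
    kept = []
    for i in range(scores.shape[0]):
        redundant = any(
            np.corrcoef(scores[i], scores[j])[0, 1] >= max_feature_correlation
            for j in kept
        )
        if not redundant:
            kept.append(i)
    return [names[i] for i in kept]


# Toy data: "b" is almost perfectly correlated with "a", so it is skipped.
toy_scores = [
    [1.0, 2.0, 3.0, 4.0],  # a
    [2.0, 4.0, 6.0, 8.1],  # b
    [4.0, 1.0, 3.0, 2.0],  # c
]
kept_names = filter_correlated_features(toy_scores, ["a", "b", "c"])
```

Processing the features in rank order means the higher-ranked member of each correlated group is the one that gets plotted.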

hotspot_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=26 nearest neighbors (0 means no nbr-averaging).
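
The value K=26 is consistent with the Stats section of this page (num_clonotypes: 262) and a neighbor fraction of 0.1. The sketch below assumes that K is obtained by truncating nbr_frac * num_clonotypes to an integer, with a floor of 1; that rule and the function name are assumptions for illustration, not confirmed conga behavior.

```python
def neighbors_from_fraction(nbr_frac, num_clonotypes):
    """Convert a neighbor fraction into a neighbor count K.
    Assumed rule: truncate the product, never returning less than 1."""
    return max(1, int(nbr_frac * num_clonotypes))


# 0.1 * 262 = 26.2, which truncates to 26.
k = neighbors_from_fraction(0.1, 262)
```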

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).
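
The score processing described above (Z-score normalization, then neighbor averaging, then downscaling of same-modality features) can be sketched as follows. This is a minimal illustration, not the conga code: the function name, the neighbor-list input format, and the use of the population standard deviation are assumptions, and the caller decides whether a clonotype's own index is included among its neighbors. Only rescale_factor_for_self_features=0.33 and the order of operations come from the text.

```python
import numpy as np


def nbr_averaged_zscores(raw_scores, nbrs, is_self_feature,
                         rescale_factor_for_self_features=0.33):
    """Hypothetical helper.

    raw_scores: one raw feature score per clonotype.
    nbrs: for each clonotype, the indices to average over.
    is_self_feature: True if the feature has the same modality as the landscape.
    """
    raw_scores = np.asarray(raw_scores, dtype=float)
    # Step 1: Z-score normalize across clonotypes.
    z = (raw_scores - raw_scores.mean()) / raw_scores.std()
    # Step 2: average the normalized scores over each neighbor set.
    averaged = np.array([z[list(idx)].mean() for idx in nbrs])
    # Step 3: downscale features from the same modality as the landscape.
    if is_self_feature:
        averaged = averaged * rescale_factor_for_self_features
    return averaged


toy_raw = [1.0, 2.0, 3.0, 4.0]
toy_nbrs = [[1], [0], [3], [2]]
other_modality = nbr_averaged_zscores(toy_raw, toy_nbrs, is_self_feature=False)
same_modality = nbr_averaged_zscores(toy_raw, toy_nbrs, is_self_feature=True)

# Undoing the downscaling is a multiplication by 1 / 0.33, about 3.03.
colorbar_multiplier = round(1 / 0.33, 2)
```

The last line shows where the "multiply by 3.03" figure comes from: it is the reciprocal of the 0.33 rescale factor.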


Image source: emoryPair8Final_hotspot_combo_features_0.100_nbrs_gex_plot_clustermap_nbr_avg.png

hotspot_tcr_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the TCR UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered based on correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) to a previously plotted feature, that feature is skipped. Points are plotted in order of increasing feature score.
Image source: emoryPair8Final_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png

hotspot_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=26 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: emoryPair8Final_hotspot_combo_features_0.100_nbrs_tcr_plot_clustermap_nbr_avg.png